Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
Deep reinforcement learning (RL) algorithms can use high-capacity deep networks to learn directly from image observations. However, these high-dimensional observation spaces present a number of challenges in practice, since the policy must now solve two problems: representation learning and task learning. In this work, we tackle these two problems separately, by explicitly learning latent representations that can accelerate reinforcement learning from images. We propose the stochastic latent actor-critic (SLAC) algorithm: a sample-efficient and high-performing RL algorithm for learning policies for complex continuous control tasks directly from high-dimensional image inputs. SLAC provides a novel and principled approach for unifying stochastic sequential models and RL into a single method, by learning a compact latent representation and then performing RL in the model's learned latent space. Our experimental evaluation demonstrates that our method outperforms both model-free and model-based alternatives in terms of final performance and sample efficiency, on a range of difficult image-based control tasks. Our code and videos of our results are available at our website.
Review for NeurIPS paper: Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
Weaknesses: - The paper's narrative is based around POMDPs, but the experimental evaluation does not really stress the capability of the method in that respect. Evaluation is done on pixel-based control, which is PO of course, but we know that a lagged observation of a few time-steps can quickly make the state fully observable. Hence, we do not know how the method fares in environments where the state uncertainty has to be actively reduced by the agent. Therefore I think the paper overstates the results. It is easy to get out of this, however, since one can just drop the POMDP claim. For me personally (and the optimal control community) it is obvious that we want some kind of state estimation when we use control, as most, if not all, practical problems are PO.
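To make the "lagged observation" remark concrete, here is a minimal sketch of frame stacking, a common way to feed an agent the last few observations at once. The class name `FrameStack` and its `reset`/`step` methods are hypothetical names chosen for illustration; they are not taken from the paper or from any particular library, and frames are treated as opaque objects (in practice they would be image arrays).

```python
from collections import deque


class FrameStack:
    """Keep the k most recent observations and return them as one stacked observation.

    Hypothetical helper for illustration only; not the paper's implementation.
    """

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Pad the history with copies of the first frame so the stack is full at step 0.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def step(self, obs):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(obs)
        return tuple(self.frames)


stack = FrameStack(3)
initial = stack.reset("f0")
after_one = stack.step("f1")
```

A fixed window like this recovers quantities such as velocity from consecutive frames, but it cannot help when the missing information lies further back than k steps, which is the case the reviewer says the evaluation does not cover.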
Review for NeurIPS paper: Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
The method targets a model-based approach to solve POMDPs with high-dimensional observation spaces. This problem is tackled by jointly learning the dynamics of the POMDP and the optimal policy by maximum likelihood, using an "RL as inference" type objective. In more detail, the latent space transitions are predicted by an inference model that is trained to maximise an evidence lower bound. The reviewers are mostly positive about the paper. They mention the theoretical soundness of the approach and the quality of writing as well as the empirical set-up and usefulness of the ablations.
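For reference, a generic evidence lower bound for a sequential latent variable model, of the kind the summary above refers to, can be written as follows. The notation is assumed here rather than taken from the review: observations x_t, actions a_t, latent states z_t, a generative model p, and an inference model q that factorizes over time steps. The paper's exact objective may differ in its details.

$$
\log p(x_{1:T} \mid a_{1:T-1}) \;\ge\; \mathbb{E}_{z_{1:T} \sim q}\left[ \sum_{t=1}^{T} \log p(x_t \mid z_t) \;-\; D_{\mathrm{KL}}\big( q(z_t \mid x_t, z_{t-1}, a_{t-1}) \,\|\, p(z_t \mid z_{t-1}, a_{t-1}) \big) \right]
$$

At t = 1 the two distributions reduce to q(z_1 | x_1) and the prior p(z_1). The first term rewards latents that reconstruct each observation; the KL term keeps the inference model close to the learned latent dynamics.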